Overview

Dataset statistics

Number of variables: 40
Number of observations: 445132
Missing cells: 902665
Missing cells (%): 5.1%
Duplicate rows: 145
Duplicate rows (%): < 0.1%
Total size in memory: 135.8 MiB
Average record size in memory: 320.0 B
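
The percentages in this block can be re-derived from the raw counts. The following is a minimal sketch, assuming that missing cells are expressed as a share of all cells (observations times variables) and that the average record size is total memory divided by the row count. The variable names are illustrative, and the memory figure is only known to one decimal place, so the last result is approximate.

```python
# Re-derive the headline percentages from the counts reported above.
n_variables = 40
n_observations = 445_132
missing_cells = 902_665
duplicate_rows = 145
total_memory_mib = 135.8  # rounded value as printed in the report

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_memory_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.3f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```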

Variable types

Text: 1
Categorical: 11
Numeric: 6
Boolean: 22

Alerts

Dataset has 145 (< 0.1%) duplicate rows [Duplicates]
HadHeartAttack is highly imbalanced (68.5%) [Imbalance]
HadAngina is highly imbalanced (67.2%) [Imbalance]
HadStroke is highly imbalanced (74.2%) [Imbalance]
HadSkinCancer is highly imbalanced (59.7%) [Imbalance]
HadCOPD is highly imbalanced (59.6%) [Imbalance]
HadKidneyDisease is highly imbalanced (73.2%) [Imbalance]
HadDiabetes is highly imbalanced (59.9%) [Imbalance]
DeafOrHardOfHearing is highly imbalanced (55.8%) [Imbalance]
BlindOrVisionDifficulty is highly imbalanced (68.9%) [Imbalance]
DifficultyDressingBathing is highly imbalanced (75.8%) [Imbalance]
DifficultyErrands is highly imbalanced (60.7%) [Imbalance]
HighRiskLastYear is highly imbalanced (74.2%) [Imbalance]
PhysicalHealthDays has 10927 (2.5%) missing values [Missing]
MentalHealthDays has 9067 (2.0%) missing values [Missing]
LastCheckupTime has 8308 (1.9%) missing values [Missing]
SleepHours has 5453 (1.2%) missing values [Missing]
RemovedTeeth has 11360 (2.6%) missing values [Missing]
DeafOrHardOfHearing has 20647 (4.6%) missing values [Missing]
BlindOrVisionDifficulty has 21564 (4.8%) missing values [Missing]
DifficultyConcentrating has 24240 (5.4%) missing values [Missing]
DifficultyWalking has 24012 (5.4%) missing values [Missing]
DifficultyDressingBathing has 23915 (5.4%) missing values [Missing]
DifficultyErrands has 25656 (5.8%) missing values [Missing]
SmokerStatus has 35462 (8.0%) missing values [Missing]
ECigaretteUsage has 35660 (8.0%) missing values [Missing]
ChestScan has 56046 (12.6%) missing values [Missing]
RaceEthnicityCategory has 14057 (3.2%) missing values [Missing]
AgeCategory has 9079 (2.0%) missing values [Missing]
HeightInMeters has 28652 (6.4%) missing values [Missing]
WeightInKilograms has 42078 (9.5%) missing values [Missing]
BMI has 48806 (11.0%) missing values [Missing]
AlcoholDrinkers has 46574 (10.5%) missing values [Missing]
HIVTesting has 66127 (14.9%) missing values [Missing]
FluVaxLast12 has 47121 (10.6%) missing values [Missing]
PneumoVaxEver has 77040 (17.3%) missing values [Missing]
TetanusLast10Tdap has 82516 (18.5%) missing values [Missing]
HighRiskLastYear has 50623 (11.4%) missing values [Missing]
CovidPos has 50764 (11.4%) missing values [Missing]
PhysicalHealthDays has 267819 (60.2%) zeros [Zeros]
MentalHealthDays has 265229 (59.6%) zeros [Zeros]
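
The imbalance percentages in these alerts are not the majority-class share. They appear to be one minus the Shannon entropy of the non-missing class counts, normalised by the logarithm of the number of classes. The sketch below is written under that assumption, with `imbalance_score` as a hypothetical helper rather than the library's own function. Applied to the value counts this report lists for HadHeartAttack and HadDiabetes, it agrees with the 68.5% and 59.9% shown above.

```python
import math


def imbalance_score(counts):
    """Return 1 - H(counts) / log(n_classes), where H is the Shannon entropy."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(n_classes)


# Non-missing value counts as listed in the variable sections of this report.
had_heart_attack = [416_959, 25_108]                # False, True
had_diabetes = [368_722, 61_158, 10_329, 3_836]     # four categories

print(f"HadHeartAttack: {imbalance_score(had_heart_attack):.1%}")
print(f"HadDiabetes: {imbalance_score(had_diabetes):.1%}")
```

A score of 0 corresponds to perfectly balanced classes and a score near 1 to a single dominant class, which is why a 94/6 split already scores close to 0.7.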

Reproduction

Analysis started: 2024-04-06 16:30:36.032901
Analysis finished: 2024-04-06 16:31:03.164430
Duration: 27.13 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
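
A report of this kind can be regenerated with ydata-profiling. The sketch below assumes the data is available as a CSV file; the function name, the default title, and the paths in the usage comment are hypothetical, and the configuration actually used for this run is the one in config.json, which is not reproduced here.

```python
def build_profile_report(csv_path, html_path, title="Profiling report"):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so the sketch can be loaded without the optional dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return report


# Usage (hypothetical paths):
#   build_profile_report("dataset.csv", "report.html")
```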

Variables

State
Text

Distinct: 54
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB

Length

Max length: 20
Median length: 12
Mean length: 8.350541
Min length: 4

Characters and Unicode

Total characters: 3717093
Distinct characters: 46
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Alabama
2nd row: Alabama
3rd row: Alabama
4th row: Alabama
5th row: Alabama

Value              Count   Frequency (%)
new                37524   7.0%
washington         26152   4.9%
york               17800   3.3%
south              17461   3.3%
minnesota          16821   3.2%
ohio               16487   3.1%
maryland           16418   3.1%
virginia           15398   2.9%
carolina           14542   2.7%
texas              14245   2.7%
Other values (50)  340315  63.8%

Most occurring characters

Value              Count    Frequency (%)
a                  479424   12.9%
i                  353695   9.5%
n                  333499   9.0%
o                  314446   8.5%
s                  259106   7.0%
e                  216472   5.8%
r                  189964   5.1%
t                  168967   4.5%
h                  124376   3.3%
l                  108158   2.9%
Other values (36)  1168986  31.4%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  3099136  83.4%
Uppercase Letter  529926   14.3%
Space Separator   88031    2.4%

Most frequent character per category

Lowercase Letter
Value              Count   Frequency (%)
a                  479424  15.5%
i                  353695  11.4%
n                  333499  10.8%
o                  314446  10.1%
s                  259106  8.4%
e                  216472  7.0%
r                  189964  6.1%
t                  168967  5.5%
h                  124376  4.0%
l                  108158  3.5%
Other values (14)  551029  17.8%

Uppercase Letter
Value              Count   Frequency (%)
M                  88455   16.7%
N                  56843   10.7%
C                  47880   9.0%
W                  46551   8.8%
I                  37175   7.0%
O                  28018   5.3%
A                  25865   4.9%
V                  25740   4.9%
T                  19511   3.7%
D                  18801   3.5%
Other values (11)  135087  25.5%

Space Separator
Value    Count  Frequency (%)
(space)  88031  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   3629062  97.6%
Common  88031    2.4%

Most frequent character per script

Latin
Value              Count    Frequency (%)
a                  479424   13.2%
i                  353695   9.7%
n                  333499   9.2%
o                  314446   8.7%
s                  259106   7.1%
e                  216472   6.0%
r                  189964   5.2%
t                  168967   4.7%
h                  124376   3.4%
l                  108158   3.0%
Other values (35)  1080955  29.8%

Common
Value    Count  Frequency (%)
(space)  88031  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3717093  100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
a                  479424   12.9%
i                  353695   9.5%
n                  333499   9.0%
o                  314446   8.5%
s                  259106   7.0%
e                  216472   5.8%
r                  189964   5.1%
t                  168967   4.5%
h                  124376   3.3%
l                  108158   2.9%
Other values (36)  1168986  31.4%

Sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB

Female: 235893
Male: 209239

Length

Max length: 6
Median length: 6
Mean length: 5.0598789
Min length: 4

Characters and Unicode

Total characters: 2252314
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value   Count   Frequency (%)
Female  235893  53.0%
Male    209239  47.0%
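
The length and character totals reported for Sex follow directly from these two value counts. A small sketch of that arithmetic, using only the numbers shown above (the variable names are illustrative):

```python
# Value counts for Sex as listed in the Common Values table.
value_counts = {"Female": 235_893, "Male": 209_239}

n_rows = sum(value_counts.values())
# Each occurrence of a value contributes len(value) characters.
total_characters = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_characters / n_rows
distinct_characters = len(set("".join(value_counts)))

print(n_rows, total_characters, round(mean_length, 7), distinct_characters)
```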

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count   Frequency (%)
female  235893  53.0%
male    209239  47.0%

Most occurring characters

Value  Count   Frequency (%)
e      681025  30.2%
a      445132  19.8%
l      445132  19.8%
F      235893  10.5%
m      235893  10.5%
M      209239  9.3%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  1807182  80.2%
Uppercase Letter  445132   19.8%

Most frequent character per category

Lowercase Letter
Value  Count   Frequency (%)
e      681025  37.7%
a      445132  24.6%
l      445132  24.6%
m      235893  13.1%

Uppercase Letter
Value  Count   Frequency (%)
F      235893  53.0%
M      209239  47.0%

Most occurring scripts

Value  Count    Frequency (%)
Latin  2252314  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
e      681025  30.2%
a      445132  19.8%
l      445132  19.8%
F      235893  10.5%
m      235893  10.5%
M      209239  9.3%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2252314  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
e      681025  30.2%
a      445132  19.8%
l      445132  19.8%
F      235893  10.5%
m      235893  10.5%
M      209239  9.3%

GeneralHealth
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 1198
Missing (%): 0.3%
Memory size: 3.4 MiB

Very good: 148444
Good: 143598
Excellent: 71878
Fair: 60273
Poor: 19741

Length

Max length: 9
Median length: 4
Mean length: 6.4814725
Min length: 4

Characters and Unicode

Total characters: 2877346
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Very good
2nd row: Excellent
3rd row: Very good
4th row: Excellent
5th row: Fair

Common Values

Value      Count   Frequency (%)
Very good  148444  33.3%
Good       143598  32.3%
Excellent  71878   16.1%
Fair       60273   13.5%
Poor       19741   4.4%
(Missing)  1198    0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count   Frequency (%)
good       292042  49.3%
very       148444  25.1%
excellent  71878   12.1%
fair       60273   10.2%
poor       19741   3.3%

Most occurring characters

Value             Count   Frequency (%)
o                 623566  21.7%
e                 292200  10.2%
d                 292042  10.1%
r                 228458  7.9%
V                 148444  5.2%
y                 148444  5.2%
(space)           148444  5.2%
g                 148444  5.2%
l                 143756  5.0%
G                 143598  5.0%
Other values (9)  559950  19.5%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  2284968  79.4%
Uppercase Letter  443934   15.4%
Space Separator   148444   5.2%
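
The category totals above can be reproduced from the GeneralHealth value counts with Python's standard `unicodedata` module, which reports the Unicode general category of each character (`Ll` for lowercase letters, `Lu` for uppercase letters, `Zs` for space separators). This is a sketch of the idea and not necessarily how the profiling library computes it:

```python
import unicodedata
from collections import Counter

# Non-missing value counts for GeneralHealth, as listed under Common Values.
value_counts = {
    "Very good": 148_444,
    "Good": 143_598,
    "Excellent": 71_878,
    "Fair": 60_273,
    "Poor": 19_741,
}

# Weight each character's general category by how often its value occurs.
category_totals = Counter()
for value, count in value_counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += count

total_characters = sum(category_totals.values())
for code, total in category_totals.most_common():
    print(code, total, f"{total / total_characters:.1%}")
```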

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
o                 623566  27.3%
e                 292200  12.8%
d                 292042  12.8%
r                 228458  10.0%
y                 148444  6.5%
g                 148444  6.5%
l                 143756  6.3%
t                 71878   3.1%
n                 71878   3.1%
c                 71878   3.1%
Other values (3)  192424  8.4%

Uppercase Letter
Value  Count   Frequency (%)
V      148444  33.4%
G      143598  32.3%
E      71878   16.2%
F      60273   13.6%
P      19741   4.4%

Space Separator
Value    Count   Frequency (%)
(space)  148444  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   2728902  94.8%
Common  148444   5.2%

Most frequent character per script

Latin
Value             Count   Frequency (%)
o                 623566  22.9%
e                 292200  10.7%
d                 292042  10.7%
r                 228458  8.4%
V                 148444  5.4%
y                 148444  5.4%
g                 148444  5.4%
l                 143756  5.3%
G                 143598  5.3%
t                 71878   2.6%
Other values (8)  488072  17.9%

Common
Value    Count   Frequency (%)
(space)  148444  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2877346  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
o                 623566  21.7%
e                 292200  10.2%
d                 292042  10.1%
r                 228458  7.9%
V                 148444  5.2%
y                 148444  5.2%
(space)           148444  5.2%
g                 148444  5.2%
l                 143756  5.0%
G                 143598  5.0%
Other values (9)  559950  19.5%

PhysicalHealthDays
Real number (ℝ)

MISSING  ZEROS 

Distinct: 31
Distinct (%): < 0.1%
Missing: 10927
Missing (%): 2.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.3479186
Minimum: 0
Maximum: 30
Zeros: 267819
Zeros (%): 60.2%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 3
95-th percentile: 30
Maximum: 30
Range: 30
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 8.688912
Coefficient of variation (CV): 1.9984072
Kurtosis: 3.4275893
Mean: 4.3479186
Median Absolute Deviation (MAD): 0
Skewness: 2.1798178
Sum: 1887888
Variance: 75.497192
Monotonicity: Not monotonic
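
Several of these statistics are simple functions of the others, which gives a quick consistency check on the block. The sketch below assumes the mean is taken over non-missing observations only and uses the rounded figures as printed, so agreement is only expected to the displayed precision:

```python
# Figures for PhysicalHealthDays as printed in this report.
n_observations = 445_132
n_missing = 10_927
total = 1_887_888          # "Sum"
std = 8.688912             # "Standard deviation"
q1, q3 = 0, 3
minimum, maximum = 0, 30

n_valid = n_observations - n_missing
mean = total / n_valid                 # mean over non-missing rows
cv = std / mean                        # coefficient of variation
variance = std ** 2
iqr = q3 - q1
value_range = maximum - minimum

print(f"mean={mean:.7f} cv={cv:.7f} variance={variance:.4f} iqr={iqr} range={value_range}")
```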
Histogram with fixed size bins (bins=31)
Value              Count   Frequency (%)
0                  267819  60.2%
30                 33082   7.4%
2                  25256   5.7%
1                  17250   3.9%
3                  15948   3.6%
5                  15315   3.4%
10                 10589   2.4%
7                  9348    2.1%
15                 8787    2.0%
4                  8462    1.9%
Other values (21)  22349   5.0%
(Missing)          10927   2.5%

Value  Count   Frequency (%)
0      267819  60.2%
1      17250   3.9%
2      25256   5.7%
3      15948   3.6%
4      8462    1.9%
5      15315   3.4%
6      2538    0.6%
7      9348    2.1%
8      1761    0.4%
9      411     0.1%

Value  Count  Frequency (%)
30     33082  7.4%
29     365    0.1%
28     751    0.2%
27     188    < 0.1%
26     109    < 0.1%
25     2181   0.5%
24     125    < 0.1%
23     99     < 0.1%
22     140    < 0.1%
21     1038   0.2%

MentalHealthDays
Real number (ℝ)

MISSING  ZEROS 

Distinct: 31
Distinct (%): < 0.1%
Missing: 9067
Missing (%): 2.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.3826494
Minimum: 0
Maximum: 30
Zeros: 265229
Zeros (%): 59.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 5
95-th percentile: 30
Maximum: 30
Range: 30
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 8.3874747
Coefficient of variation (CV): 1.9137909
Kurtosis: 3.3592286
Mean: 4.3826494
Median Absolute Deviation (MAD): 0
Skewness: 2.1232157
Sum: 1911120
Variance: 70.349731
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Value              Count   Frequency (%)
0                  265229  59.6%
30                 26990   6.1%
2                  23785   5.3%
5                  19951   4.5%
10                 15414   3.5%
3                  15345   3.4%
15                 14519   3.3%
1                  14409   3.2%
20                 9150    2.1%
4                  7943    1.8%
Other values (21)  23330   5.2%
(Missing)          9067    2.0%

Value  Count   Frequency (%)
0      265229  59.6%
1      14409   3.2%
2      23785   5.3%
3      15345   3.4%
4      7943    1.8%
5      19951   4.5%
6      2305    0.5%
7      7844    1.8%
8      1749    0.4%
9      322     0.1%

Value  Count  Frequency (%)
30     26990  6.1%
29     502    0.1%
28     910    0.2%
27     241    0.1%
26     106    < 0.1%
25     3078   0.7%
24     124    < 0.1%
23     97     < 0.1%
22     193    < 0.1%
21     549    0.1%

LastCheckupTime
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 8308
Missing (%): 1.9%
Memory size: 3.4 MiB

Within past year (anytime less than 12 months ago): 350944
Within past 2 years (1 year but less than 2 years ago): 41919
Within past 5 years (2 years but less than 5 years ago): 24882
5 or more years ago: 19079

Length

Max length: 55
Median length: 50
Mean length: 49.314683
Min length: 19

Characters and Unicode

Total characters: 21541837
Distinct characters: 23
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Within past year (anytime less than 12 months ago)
2nd row: Within past year (anytime less than 12 months ago)
3rd row: Within past year (anytime less than 12 months ago)
4th row: Within past year (anytime less than 12 months ago)
5th row: Within past year (anytime less than 12 months ago)

Common Values

Value                                                    Count   Frequency (%)
Within past year (anytime less than 12 months ago)       350944  78.8%
Within past 2 years (1 year but less than 2 years ago)   41919   9.4%
Within past 5 years (2 years but less than 5 years ago)  24882   5.6%
5 or more years ago                                      19079   4.3%
(Missing)                                                8308    1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value             Count   Frequency (%)
ago               436824  10.8%
within            417745  10.3%
past              417745  10.3%
less              417745  10.3%
than              417745  10.3%
year              392863  9.7%
anytime           350944  8.7%
12                350944  8.7%
months            350944  8.7%
years             177563  4.4%
Other values (6)  324441  8.0%

Most occurring characters

Value              Count    Frequency (%)
(space)            3618679  16.8%
a                  2193684  10.2%
t                  2021924  9.4%
s                  1781742  8.3%
n                  1537378  7.1%
e                  1358194  6.3%
h                  1186434  5.5%
i                  1186434  5.5%
y                  921370   4.3%
o                  825926   3.8%
Other values (13)  4910072  22.8%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   15748553  73.1%
Space Separator    3618679   16.8%
Decimal Number     921370    4.3%
Close Punctuation  417745    1.9%
Uppercase Letter   417745    1.9%
Open Punctuation   417745    1.9%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
a                 2193684  13.9%
t                 2021924  12.8%
s                 1781742  11.3%
n                 1537378  9.8%
e                 1358194  8.6%
h                 1186434  7.5%
i                 1186434  7.5%
y                 921370   5.9%
o                 825926   5.2%
m                 720967   4.6%
Other values (6)  2014500  12.8%

Decimal Number
Value  Count   Frequency (%)
2      459664  49.9%
1      392863  42.6%
5      68843   7.5%

Space Separator
Value    Count    Frequency (%)
(space)  3618679  100.0%

Close Punctuation
Value  Count   Frequency (%)
)      417745  100.0%

Uppercase Letter
Value  Count   Frequency (%)
W      417745  100.0%

Open Punctuation
Value  Count   Frequency (%)
(      417745  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   16166298  75.0%
Common  5375539   25.0%

Most frequent character per script

Latin
Value             Count    Frequency (%)
a                 2193684  13.6%
t                 2021924  12.5%
s                 1781742  11.0%
n                 1537378  9.5%
e                 1358194  8.4%
h                 1186434  7.3%
i                 1186434  7.3%
y                 921370   5.7%
o                 825926   5.1%
m                 720967   4.5%
Other values (7)  2432245  15.0%

Common
Value    Count    Frequency (%)
(space)  3618679  67.3%
2        459664   8.6%
)        417745   7.8%
(        417745   7.8%
1        392863   7.3%
5        68843    1.3%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  21541837  100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
(space)            3618679  16.8%
a                  2193684  10.2%
t                  2021924  9.4%
s                  1781742  8.3%
n                  1537378  7.1%
e                  1358194  6.3%
h                  1186434  5.5%
i                  1186434  5.5%
y                  921370   4.3%
o                  825926   3.8%
Other values (13)  4910072  22.8%

Distinct: 2
Distinct (%): < 0.1%
Missing: 1093
Missing (%): 0.2%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
True       337559  75.8%
False      106480  23.9%
(Missing)  1093    0.2%

SleepHours
Real number (ℝ)

MISSING 

Distinct: 24
Distinct (%): < 0.1%
Missing: 5453
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.0229827
Minimum: 1
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 6
Median: 7
Q3: 8
95-th percentile: 9
Maximum: 24
Range: 23
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.502425
Coefficient of variation (CV): 0.21392976
Kurtosis: 8.7411699
Mean: 7.0229827
Median Absolute Deviation (MAD): 1
Skewness: 0.7646025
Sum: 3087858
Variance: 2.2572809
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value              Count   Frequency (%)
7                  132927  29.9%
8                  125442  28.2%
6                  95880   21.5%
5                  30122   6.8%
9                  21210   4.8%
4                  12433   2.8%
10                 10459   2.3%
3                  3260    0.7%
12                 3004    0.7%
2                  1549    0.3%
Other values (14)  3393    0.8%
(Missing)          5453    1.2%

Value  Count   Frequency (%)
1      1154    0.3%
2      1549    0.3%
3      3260    0.7%
4      12433   2.8%
5      30122   6.8%
6      95880   21.5%
7      132927  29.9%
8      125442  28.2%
9      21210   4.8%
10     10459   2.3%

Value  Count  Frequency (%)
24     52     < 0.1%
23     18     < 0.1%
22     19     < 0.1%
21     4      < 0.1%
20     143    < 0.1%
19     16     < 0.1%
18     168    < 0.1%
17     27     < 0.1%
16     329    0.1%
15     317    0.1%

RemovedTeeth
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 11360
Missing (%): 2.6%
Memory size: 3.4 MiB

None of them: 233455
1 to 5: 129294
6 or more, but not all: 45570
All: 25453

Length

Max length: 22
Median length: 12
Mean length: 10.734033
Min length: 3

Characters and Unicode

Total characters: 4656123
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: None of them
2nd row: None of them
3rd row: 1 to 5
4th row: 6 or more, but not all
5th row: None of them

Common Values

Value                   Count   Frequency (%)
None of them            233455  52.4%
1 to 5                  129294  29.0%
6 or more, but not all  45570   10.2%
All                     25453   5.7%
(Missing)               11360   2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value             Count   Frequency (%)
none              233455  16.8%
of                233455  16.8%
them              233455  16.8%
1                 129294  9.3%
to                129294  9.3%
5                 129294  9.3%
all               71023   5.1%
6                 45570   3.3%
or                45570   3.3%
more              45570   3.3%
Other values (2)  91140   6.6%

Most occurring characters

Value             Count   Frequency (%)
(space)           953348  20.5%
o                 732914  15.7%
e                 512480  11.0%
t                 453889  9.7%
n                 279025  6.0%
m                 279025  6.0%
N                 233455  5.0%
f                 233455  5.0%
h                 233455  5.0%
l                 142046  3.1%
Other values (9)  603031  13.0%

Most occurring categories

Value              Count    Frequency (%)
Lowercase Letter   3094139  66.5%
Space Separator    953348   20.5%
Decimal Number     304158   6.5%
Uppercase Letter   258908   5.6%
Other Punctuation  45570    1.0%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
o                 732914  23.7%
e                 512480  16.6%
t                 453889  14.7%
n                 279025  9.0%
m                 279025  9.0%
f                 233455  7.5%
h                 233455  7.5%
l                 142046  4.6%
r                 91140   2.9%
b                 45570   1.5%
Other values (2)  91140   2.9%

Decimal Number
Value  Count   Frequency (%)
1      129294  42.5%
5      129294  42.5%
6      45570   15.0%

Uppercase Letter
Value  Count   Frequency (%)
N      233455  90.2%
A      25453   9.8%

Space Separator
Value    Count   Frequency (%)
(space)  953348  100.0%

Other Punctuation
Value  Count  Frequency (%)
,      45570  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   3353047  72.0%
Common  1303076  28.0%

Most frequent character per script

Latin
Value             Count   Frequency (%)
o                 732914  21.9%
e                 512480  15.3%
t                 453889  13.5%
n                 279025  8.3%
m                 279025  8.3%
N                 233455  7.0%
f                 233455  7.0%
h                 233455  7.0%
l                 142046  4.2%
r                 91140   2.7%
Other values (4)  162163  4.8%

Common
Value    Count   Frequency (%)
(space)  953348  73.2%
1        129294  9.9%
5        129294  9.9%
6        45570   3.5%
,        45570   3.5%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4656123  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
(space)           953348  20.5%
o                 732914  15.7%
e                 512480  11.0%
t                 453889  9.7%
n                 279025  6.0%
m                 279025  6.0%
N                 233455  5.0%
f                 233455  5.0%
h                 233455  5.0%
l                 142046  3.1%
Other values (9)  603031  13.0%

HadHeartAttack
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 3065
Missing (%): 0.7%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      416959  93.7%
True       25108   5.6%
(Missing)  3065    0.7%

HadAngina
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 4405
Missing (%): 1.0%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      414176  93.0%
True       26551   6.0%
(Missing)  4405    1.0%

HadStroke
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 1557
Missing (%): 0.3%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      424336  95.3%
True       19239   4.3%
(Missing)  1557    0.3%

HadAsthma
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 1773
Missing (%): 0.4%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      376665  84.6%
True       66694   15.0%
(Missing)  1773    0.4%

HadSkinCancer
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 3143
Missing (%): 0.7%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      406504  91.3%
True       35485   8.0%
(Missing)  3143    0.7%

HadCOPD
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 2219
Missing (%): 0.5%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      407257  91.5%
True       35656   8.0%
(Missing)  2219    0.5%

Distinct: 2
Distinct (%): < 0.1%
Missing: 2812
Missing (%): 0.6%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      350910  78.8%
True       91410   20.5%
(Missing)  2812    0.6%

HadKidneyDisease
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 1926
Missing (%): 0.4%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      422891  95.0%
True       20315   4.6%
(Missing)  1926    0.4%

Distinct: 2
Distinct (%): < 0.1%
Missing: 2633
Missing (%): 0.6%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      291351  65.5%
True       151148  34.0%
(Missing)  2633    0.6%

HadDiabetes
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 1087
Missing (%): 0.2%
Memory size: 3.4 MiB

No: 368722
Yes: 61158
No, pre-diabetes or borderline diabetes: 10329
Yes, but only during pregnancy (female): 3836

Length

Max length: 39
Median length: 2
Mean length: 3.3180263
Min length: 2

Characters and Unicode

Total characters: 1473353
Distinct characters: 25
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yes
2nd row: No
3rd row: No
4th row: No
5th row: No

Common Values

Value                                    Count   Frequency (%)
No                                       368722  82.8%
Yes                                      61158   13.7%
No, pre-diabetes or borderline diabetes  10329   2.3%
Yes, but only during pregnancy (female)  3836    0.9%
(Missing)                                1087    0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count   Frequency (%)
no            379051  75.1%
yes           64994   12.9%
pre-diabetes  10329   2.0%
or            10329   2.0%
borderline    10329   2.0%
diabetes      10329   2.0%
but           3836    0.8%
only          3836    0.8%
during        3836    0.8%
pregnancy     3836    0.8%

Most occurring characters

Value              Count   Frequency (%)
o                  403545  27.4%
N                  379051  25.7%
e                  148805  10.1%
s                  85652   5.8%
Y                  64994   4.4%
(space)            60496   4.1%
r                  48988   3.3%
d                  34823   2.4%
b                  34823   2.4%
i                  34823   2.4%
Other values (15)  177353  12.0%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   936646  63.6%
Uppercase Letter   444045  30.1%
Space Separator    60496   4.1%
Other Punctuation  14165   1.0%
Dash Punctuation   10329   0.7%
Open Punctuation   3836    0.3%
Close Punctuation  3836    0.3%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
o                 403545  43.1%
e                 148805  15.9%
s                 85652   9.1%
r                 48988   5.2%
d                 34823   3.7%
b                 34823   3.7%
i                 34823   3.7%
a                 28330   3.0%
n                 25673   2.7%
t                 24494   2.6%
Other values (8)  66690   7.1%

Uppercase Letter
Value  Count   Frequency (%)
N      379051  85.4%
Y      64994   14.6%

Space Separator
Value    Count  Frequency (%)
(space)  60496  100.0%

Other Punctuation
Value  Count  Frequency (%)
,      14165  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      10329  100.0%

Open Punctuation
Value  Count  Frequency (%)
(      3836   100.0%

Close Punctuation
Value  Count  Frequency (%)
)      3836   100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1380691  93.7%
Common  92662    6.3%

Most frequent character per script

Latin
Value              Count   Frequency (%)
o                  403545  29.2%
N                  379051  27.5%
e                  148805  10.8%
s                  85652   6.2%
Y                  64994   4.7%
r                  48988   3.5%
d                  34823   2.5%
b                  34823   2.5%
i                  34823   2.5%
a                  28330   2.1%
Other values (10)  116857  8.5%

Common
Value    Count  Frequency (%)
(space)  60496  65.3%
,        14165  15.3%
-        10329  11.1%
(        3836   4.1%
)        3836   4.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1473353  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
o                  403545  27.4%
N                  379051  25.7%
e                  148805  10.1%
s                  85652   5.8%
Y                  64994   4.4%
(space)            60496   4.1%
r                  48988   3.3%
d                  34823   2.4%
b                  34823   2.4%
i                  34823   2.4%
Other values (15)  177353  12.0%

DeafOrHardOfHearing
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 20647
Missing (%): 4.6%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      385539  86.6%
True       38946   8.7%
(Missing)  20647   4.6%

BlindOrVisionDifficulty
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 21564
Missing (%): 4.8%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      399910  89.8%
True       23658   5.3%
(Missing)  21564   4.8%

DifficultyConcentrating
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 24240
Missing (%): 5.4%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      370792  83.3%
True       50100   11.3%
(Missing)  24240   5.4%

DifficultyWalking
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 24012
Missing (%): 5.4%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      353039  79.3%
True       68081   15.3%
(Missing)  24012   5.4%

DifficultyDressingBathing
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 23915
Missing (%): 5.4%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      404404  90.9%
True       16813   3.8%
(Missing)  23915   5.4%

DifficultyErrands
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 25656
Missing (%): 5.8%
Memory size: 869.5 KiB

Value      Count   Frequency (%)
False      387029  86.9%
True       32447   7.3%
(Missing)  25656   5.8%

SmokerStatus
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 35462
Missing (%): 8.0%
Memory size: 3.4 MiB

Never smoked: 245955
Former smoker: 113774
Current smoker - now smokes every day: 36003
Current smoker - now smokes some days: 13938

Length

Max length: 37
Median length: 12
Mean length: 15.325357
Min length: 12

Characters and Unicode

Total characters: 6278339
Distinct characters: 19
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Never smoked
2nd row: Never smoked
3rd row: Never smoked
4th row: Current smoker - now smokes some days
5th row: Never smoked

Common Values

Value                                  Count   Frequency (%)
Never smoked                           245955  55.3%
Former smoker                          113774  25.6%
Current smoker - now smokes every day  36003   8.1%
Current smoker - now smokes some days  13938   3.1%
(Missing)                              35462   8.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value             Count   Frequency (%)
never             245955  23.0%
smoked            245955  23.0%
smoker            163715  15.3%
former            113774  10.6%
current           49941   4.7%
(blank)           49941   4.7%
now               49941   4.7%
smokes            49941   4.7%
every             36003   3.4%
day               36003   3.4%
Other values (2)  27876   2.6%

Most occurring characters

Value             Count    Frequency (%)
e                 1201180  19.1%
r                 773103   12.3%
(space)           659375   10.5%
o                 637264   10.2%
m                 587323   9.4%
s                 537428   8.6%
k                 459611   7.3%
d                 295896   4.7%
v                 281958   4.5%
N                 245955   3.9%
Other values (9)  599246   9.5%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  5159353  82.2%
Space Separator   659375   10.5%
Uppercase Letter  409670   6.5%
Dash Punctuation  49941    0.8%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
e                 1201180  23.3%
r                 773103   15.0%
o                 637264   12.4%
m                 587323   11.4%
s                 537428   10.4%
k                 459611   8.9%
d                 295896   5.7%
v                 281958   5.5%
n                 99882    1.9%
y                 85944    1.7%
Other values (4)  199764   3.9%

Uppercase Letter
Value  Count   Frequency (%)
N      245955  60.0%
F      113774  27.8%
C      49941   12.2%

Space Separator
Value    Count   Frequency (%)
(space)  659375  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      49941  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   5569023  88.7%
Common  709316   11.3%

Most frequent character per script

Latin
Value             Count    Frequency (%)
e                 1201180  21.6%
r                 773103   13.9%
o                 637264   11.4%
m                 587323   10.5%
s                 537428   9.7%
k                 459611   8.3%
d                 295896   5.3%
v                 281958   5.1%
N                 245955   4.4%
F                 113774   2.0%
Other values (7)  435531   7.8%

Common
Value    Count   Frequency (%)
(space)  659375  93.0%
-        49941   7.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6278339  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
e                 1201180  19.1%
r                 773103   12.3%
(space)           659375   10.5%
o                 637264   10.2%
m                 587323   9.4%
s                 537428   8.6%
k                 459611   7.3%
d                 295896   4.7%
v                 281958   4.5%
N                 245955   3.9%
Other values (9)  599246   9.5%

ECigaretteUsage
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 35660
Missing (%): 8.0%
Memory size: 3.4 MiB

Never used e-cigarettes in my entire life: 311988
Not at all (right now): 75368
Use them some days: 11734
Use them every day: 10382

Length

Max length: 41
Median length: 41
Mean length: 36.260579
Min length: 18
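
The length statistics can be recomputed from the category counts alone. The sketch below is a minimal check in plain Python, assuming the four ECigaretteUsage categories and counts listed in this section and excluding missing values; the variable names are illustrative and not part of the report.

```python
# Category strings and counts copied from the ECigaretteUsage summary.
counts = {
    "Never used e-cigarettes in my entire life": 311988,
    "Not at all (right now)": 75368,
    "Use them some days": 11734,
    "Use them every day": 10382,
}

n_values = sum(counts.values())  # non-missing observations
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_values

lengths = [len(value) for value in counts]
print("min/max length:", min(lengths), max(lengths))
print("total characters:", total_chars)
print("mean length:", round(mean_length, 6))
```

The character total obtained this way is the same figure the report lists as "Total characters" for this variable.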

Characters and Unicode

Total characters: 14847692
Distinct characters: 25
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
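
As an illustration of those character properties, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The two-letter codes are the standard abbreviations: `Lu` uppercase letter, `Ll` lowercase letter, `Zs` space separator, `Pd` dash punctuation, `Ps` open punctuation, `Pe` close punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "Not at all (right now)"
cats = category_counts(sample)
print(dict(cats))
```

Applying the same function to every value of a column and summing the counters would give per-category totals of the kind shown under "Most occurring categories".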

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not at all (right now)
2nd row: Never used e-cigarettes in my entire life
3rd row: Never used e-cigarettes in my entire life
4th row: Never used e-cigarettes in my entire life
5th row: Never used e-cigarettes in my entire life

Common Values

Value | Count | Frequency (%)
Never used e-cigarettes in my entire life | 311988 | 70.1%
Not at all (right now) | 75368 | 16.9%
Use them some days | 11734 | 2.6%
Use them every day | 10382 | 2.3%
(Missing) | 35660 | 8.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
never | 311988 | 11.8%
used | 311988 | 11.8%
e-cigarettes | 311988 | 11.8%
in | 311988 | 11.8%
my | 311988 | 11.8%
entire | 311988 | 11.8%
life | 311988 | 11.8%
now | 75368 | 2.8%
right | 75368 | 2.8%
all | 75368 | 2.8%
Other values (8) | 239200 | 9.0%
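
The word counts above can be reproduced from the category counts. The sketch below assumes a simple tokenization: lowercase, split on whitespace, and strip punctuation from both ends of each word. That rule is an assumption that happens to match the figures here; the report generator's actual tokenizer may differ in details.

```python
import string
from collections import Counter

# Category strings and counts copied from the ECigaretteUsage summary.
counts = {
    "Never used e-cigarettes in my entire life": 311988,
    "Not at all (right now)": 75368,
    "Use them some days": 11734,
    "Use them every day": 10382,
}


def tokens(value: str) -> list[str]:
    """Lowercase, split on whitespace, strip punctuation at word edges."""
    return [w.strip(string.punctuation) for w in value.lower().split()]


word_counts = Counter()
for value, n in counts.items():
    for word in tokens(value):
        word_counts[word] += n

top10 = word_counts.most_common(10)
other_total = sum(word_counts.values()) - sum(n for _, n in top10)
other_distinct = len(word_counts) - len(top10)
print(top10)
print("Other values:", other_distinct, other_total)
```

Words tied on count may appear in a different order than in the table, since ties are broken by first occurrence here.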

Most occurring characters

Value | Count | Frequency (%)
e | 2884622 | 19.4%
(space) | 2239748 | 15.1%
i | 1323320 | 8.9%
t | 1184184 | 8.0%
r | 1021714 | 6.9%
n | 699344 | 4.7%
s | 669560 | 4.5%
a | 484840 | 3.3%
l | 462724 | 3.1%
g | 387356 | 2.6%
Other values (15) | 3490280 | 23.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 11735748 | 79.0%
Space Separator | 2239748 | 15.1%
Uppercase Letter | 409472 | 2.8%
Dash Punctuation | 311988 | 2.1%
Open Punctuation | 75368 | 0.5%
Close Punctuation | 75368 | 0.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 2884622 | 24.6%
i | 1323320 | 11.3%
t | 1184184 | 10.1%
r | 1021714 | 8.7%
n | 699344 | 6.0%
s | 669560 | 5.7%
a | 484840 | 4.1%
l | 462724 | 3.9%
g | 387356 | 3.3%
m | 345838 | 2.9%
Other values (9) | 2272246 | 19.4%

Uppercase Letter
Value | Count | Frequency (%)
N | 387356 | 94.6%
U | 22116 | 5.4%

Space Separator
Value | Count | Frequency (%)
(space) | 2239748 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 311988 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 75368 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 75368 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12145220 | 81.8%
Common | 2702472 | 18.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 2884622 | 23.8%
i | 1323320 | 10.9%
t | 1184184 | 9.8%
r | 1021714 | 8.4%
n | 699344 | 5.8%
s | 669560 | 5.5%
a | 484840 | 4.0%
l | 462724 | 3.8%
g | 387356 | 3.2%
N | 387356 | 3.2%
Other values (11) | 2640200 | 21.7%

Common
Value | Count | Frequency (%)
(space) | 2239748 | 82.9%
- | 311988 | 11.5%
( | 75368 | 2.8%
) | 75368 | 2.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 14847692 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 2884622 | 19.4%
(space) | 2239748 | 15.1%
i | 1323320 | 8.9%
t | 1184184 | 8.0%
r | 1021714 | 6.9%
n | 699344 | 4.7%
s | 669560 | 4.5%
a | 484840 | 3.3%
l | 462724 | 3.1%
g | 387356 | 2.6%
Other values (15) | 3490280 | 23.5%

ChestScan
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 56046
Missing (%): 12.6%
Memory size: 869.5 KiB

Value | Count | Frequency (%)
False | 223221 | 50.1%
True | 165865 | 37.3%
(Missing) | 56046 | 12.6%
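
The percentages in this table are shares of all observations, missing values included, which is why False and True do not sum to 100%. A short check in plain Python, taking the total of 445132 observations from the dataset overview; the share among answered rows at the end is an extra illustration that does not appear in the report.

```python
n_obs = 445132  # "Number of observations" from the dataset overview
chest_scan = {"False": 223221, "True": 165865, "(Missing)": 56046}

# The three counts account for every row.
total = sum(chest_scan.values())

# Frequency (%) as reported: each count divided by all observations.
shares = {value: round(100 * n / n_obs, 1) for value, n in chest_scan.items()}
print(shares)

# Share of True among rows with an answer (not shown in the report).
answered = chest_scan["False"] + chest_scan["True"]
true_share_answered = chest_scan["True"] / answered
print(f"{true_share_answered:.1%}")
```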

RaceEthnicityCategory
Categorical

MISSING 

Distinct: 5
Distinct (%): < 0.1%
Missing: 14057
Missing (%): 3.2%
Memory size: 3.4 MiB

Value | Count
White only, Non-Hispanic | 320421
Hispanic | 42917
Black only, Non-Hispanic | 35446
Other race only, Non-Hispanic | 22713
Multiracial, Non-Hispanic | 9578

Length

Max length: 29
Median length: 24
Mean length: 22.692736
Min length: 8

Characters and Unicode

Total characters: 9782271
Distinct characters: 24
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: White only, Non-Hispanic
2nd row: White only, Non-Hispanic
3rd row: White only, Non-Hispanic
4th row: White only, Non-Hispanic
5th row: White only, Non-Hispanic

Common Values

Value | Count | Frequency (%)
White only, Non-Hispanic | 320421 | 72.0%
Hispanic | 42917 | 9.6%
Black only, Non-Hispanic | 35446 | 8.0%
Other race only, Non-Hispanic | 22713 | 5.1%
Multiracial, Non-Hispanic | 9578 | 2.2%
(Missing) | 14057 | 3.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
non-hispanic | 388158 | 31.8%
only | 378580 | 31.0%
white | 320421 | 26.3%
hispanic | 42917 | 3.5%
black | 35446 | 2.9%
other | 22713 | 1.9%
race | 22713 | 1.9%
multiracial | 9578 | 0.8%

Most occurring characters

Value | Count | Frequency (%)
i | 1201727 | 12.3%
n | 1197813 | 12.2%
(space) | 789451 | 8.1%
o | 766738 | 7.8%
a | 508390 | 5.2%
c | 498812 | 5.1%
l | 433182 | 4.4%
H | 431075 | 4.4%
s | 431075 | 4.4%
p | 431075 | 4.4%
Other values (14) | 3092933 | 31.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 7009113 | 71.7%
Uppercase Letter | 1207391 | 12.3%
Space Separator | 789451 | 8.1%
Other Punctuation | 388158 | 4.0%
Dash Punctuation | 388158 | 4.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 1201727 | 17.1%
n | 1197813 | 17.1%
o | 766738 | 10.9%
a | 508390 | 7.3%
c | 498812 | 7.1%
l | 433182 | 6.2%
s | 431075 | 6.2%
p | 431075 | 6.2%
y | 378580 | 5.4%
e | 365847 | 5.2%
Other values (5) | 795874 | 11.4%

Uppercase Letter
Value | Count | Frequency (%)
H | 431075 | 35.7%
N | 388158 | 32.1%
W | 320421 | 26.5%
B | 35446 | 2.9%
O | 22713 | 1.9%
M | 9578 | 0.8%

Space Separator
Value | Count | Frequency (%)
(space) | 789451 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 388158 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 388158 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8216504 | 84.0%
Common | 1565767 | 16.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 1201727 | 14.6%
n | 1197813 | 14.6%
o | 766738 | 9.3%
a | 508390 | 6.2%
c | 498812 | 6.1%
l | 433182 | 5.3%
H | 431075 | 5.2%
s | 431075 | 5.2%
p | 431075 | 5.2%
N | 388158 | 4.7%
Other values (11) | 1928459 | 23.5%

Common
Value | Count | Frequency (%)
(space) | 789451 | 50.4%
, | 388158 | 24.8%
- | 388158 | 24.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9782271 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 1201727 | 12.3%
n | 1197813 | 12.2%
(space) | 789451 | 8.1%
o | 766738 | 7.8%
a | 508390 | 5.2%
c | 498812 | 5.1%
l | 433182 | 4.4%
H | 431075 | 4.4%
s | 431075 | 4.4%
p | 431075 | 4.4%
Other values (14) | 3092933 | 31.6%

AgeCategory
Categorical

MISSING 

Distinct: 13
Distinct (%): < 0.1%
Missing: 9079
Missing (%): 2.0%
Memory size: 3.4 MiB

Value | Count
Age 65 to 69 | 47099
Age 60 to 64 | 44511
Age 70 to 74 | 43472
Age 55 to 59 | 36821
Age 80 or older | 36251
Other values (8) | 227899

Length

Max length: 15
Median length: 12
Mean length: 12.249403
Min length: 12

Characters and Unicode

Total characters: 5341389
Distinct characters: 19
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Age 80 or older
2nd row: Age 80 or older
3rd row: Age 55 to 59
4th row: Age 40 to 44
5th row: Age 80 or older

Common Values

Value | Count | Frequency (%)
Age 65 to 69 | 47099 | 10.6%
Age 60 to 64 | 44511 | 10.0%
Age 70 to 74 | 43472 | 9.8%
Age 55 to 59 | 36821 | 8.3%
Age 80 or older | 36251 | 8.1%
Age 50 to 54 | 33644 | 7.6%
Age 75 to 79 | 32518 | 7.3%
Age 40 to 44 | 29942 | 6.7%
Age 45 to 49 | 28531 | 6.4%
Age 35 to 39 | 28526 | 6.4%
Other values (3) | 74738 | 16.8%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
age | 436053 | 25.0%
to | 399802 | 22.9%
65 | 47099 | 2.7%
69 | 47099 | 2.7%
60 | 44511 | 2.6%
64 | 44511 | 2.6%
70 | 43472 | 2.5%
74 | 43472 | 2.5%
55 | 36821 | 2.1%
59 | 36821 | 2.1%
Other values (19) | 564551 | 32.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 1308159 | 24.5%
e | 472304 | 8.8%
o | 472304 | 8.8%
A | 436053 | 8.2%
g | 436053 | 8.2%
t | 399802 | 7.5%
5 | 336415 | 6.3%
4 | 321263 | 6.0%
0 | 213627 | 4.0%
9 | 195485 | 3.7%
Other values (9) | 749924 | 14.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1925467 | 36.0%
Decimal Number | 1671710 | 31.3%
Space Separator | 1308159 | 24.5%
Uppercase Letter | 436053 | 8.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
5 | 336415 | 20.1%
4 | 321263 | 19.2%
0 | 213627 | 12.8%
9 | 195485 | 11.7%
6 | 183220 | 11.0%
7 | 151980 | 9.1%
3 | 108666 | 6.5%
2 | 70921 | 4.2%
8 | 63192 | 3.8%
1 | 26941 | 1.6%

Lowercase Letter
Value | Count | Frequency (%)
e | 472304 | 24.5%
o | 472304 | 24.5%
g | 436053 | 22.6%
t | 399802 | 20.8%
r | 72502 | 3.8%
l | 36251 | 1.9%
d | 36251 | 1.9%

Space Separator
Value | Count | Frequency (%)
(space) | 1308159 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
A | 436053 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2979869 | 55.8%
Latin | 2361520 | 44.2%

Most frequent character per script

Common
Value | Count | Frequency (%)
(space) | 1308159 | 43.9%
5 | 336415 | 11.3%
4 | 321263 | 10.8%
0 | 213627 | 7.2%
9 | 195485 | 6.6%
6 | 183220 | 6.1%
7 | 151980 | 5.1%
3 | 108666 | 3.6%
2 | 70921 | 2.4%
8 | 63192 | 2.1%

Latin
Value | Count | Frequency (%)
e | 472304 | 20.0%
o | 472304 | 20.0%
A | 436053 | 18.5%
g | 436053 | 18.5%
t | 399802 | 16.9%
r | 72502 | 3.1%
l | 36251 | 1.5%
d | 36251 | 1.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5341389 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1308159 | 24.5%
e | 472304 | 8.8%
o | 472304 | 8.8%
A | 436053 | 8.2%
g | 436053 | 8.2%
t | 399802 | 7.5%
5 | 336415 | 6.3%
4 | 321263 | 6.0%
0 | 213627 | 4.0%
9 | 195485 | 3.7%
Other values (9) | 749924 | 14.0%

HeightInMeters
Real number (ℝ)

MISSING 

Distinct: 109
Distinct (%): < 0.1%
Missing: 28652
Missing (%): 6.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7026906
Minimum: 0.91
Maximum: 2.41
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 0.91
5-th percentile: 1.52
Q1: 1.63
Median: 1.7
Q3: 1.78
95-th percentile: 1.88
Maximum: 2.41
Range: 1.5
Interquartile range (IQR): 0.15

Descriptive statistics

Standard deviation: 0.1071775
Coefficient of variation (CV): 0.062945964
Kurtosis: 0.18229935
Mean: 1.7026906
Median Absolute Deviation (MAD): 0.08
Skewness: 0.028899535
Sum: 709136.57
Variance: 0.011487016
Monotonicity: Not monotonic
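
Several of these figures are derived from others in the same section, which makes them easy to cross-check. The sketch below recomputes the range, IQR, coefficient of variation and variance for HeightInMeters from the reported minimum, maximum, quartiles, mean and standard deviation, using the standard definitions. It only confirms that the published numbers are mutually consistent; it does not recompute anything from the raw data.

```python
# Values copied from the HeightInMeters statistics.
minimum, q1, q3, maximum = 0.91, 1.63, 1.78, 2.41
mean, std = 1.7026906, 0.1071775

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(round(value_range, 2), round(iqr, 2), round(cv, 6), round(variance, 6))
```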
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1.68 | 36782 | 8.3%
1.63 | 35622 | 8.0%
1.7 | 34038 | 7.6%
1.65 | 32785 | 7.4%
1.78 | 32038 | 7.2%
1.73 | 30910 | 6.9%
1.75 | 29157 | 6.6%
1.6 | 28296 | 6.4%
1.83 | 28294 | 6.4%
1.57 | 26944 | 6.1%
Other values (99) | 101614 | 22.8%
(Missing) | 28652 | 6.4%

Minimum 10 values
Value | Count | Frequency (%)
0.91 | 24 | < 0.1%
0.92 | 1 | < 0.1%
0.95 | 1 | < 0.1%
0.97 | 4 | < 0.1%
0.99 | 1 | < 0.1%
1 | 4 | < 0.1%
1.02 | 3 | < 0.1%
1.03 | 3 | < 0.1%
1.04 | 18 | < 0.1%
1.05 | 29 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2.41 | 5 | < 0.1%
2.36 | 1 | < 0.1%
2.34 | 4 | < 0.1%
2.29 | 5 | < 0.1%
2.26 | 11 | < 0.1%
2.24 | 2 | < 0.1%
2.21 | 9 | < 0.1%
2.18 | 10 | < 0.1%
2.16 | 10 | < 0.1%
2.13 | 29 | < 0.1%

WeightInKilograms
Real number (ℝ)

MISSING 

Distinct: 599
Distinct (%): 0.1%
Missing: 42078
Missing (%): 9.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 83.07447
Minimum: 22.68
Maximum: 292.57
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 22.68
5-th percentile: 54.43
Q1: 68.04
Median: 80.74
Q3: 95.25
95-th percentile: 122.47
Maximum: 292.57
Range: 269.89
Interquartile range (IQR): 27.21

Descriptive statistics

Standard deviation: 21.448173
Coefficient of variation (CV): 0.25818007
Kurtosis: 2.7389723
Mean: 83.07447
Median Absolute Deviation (MAD): 12.7
Skewness: 1.0756118
Sum: 33483498
Variance: 460.02411
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
90.72 | 21311 | 4.8%
81.65 | 19709 | 4.4%
68.04 | 17595 | 4.0%
72.57 | 17177 | 3.9%
77.11 | 15979 | 3.6%
86.18 | 14202 | 3.2%
63.5 | 12924 | 2.9%
79.38 | 11722 | 2.6%
99.79 | 10890 | 2.4%
74.84 | 10809 | 2.4%
Other values (589) | 250736 | 56.3%
(Missing) | 42078 | 9.5%

Minimum 10 values
Value | Count | Frequency (%)
22.68 | 10 | < 0.1%
23 | 1 | < 0.1%
23.13 | 1 | < 0.1%
23.59 | 2 | < 0.1%
24 | 1 | < 0.1%
24.04 | 3 | < 0.1%
24.49 | 2 | < 0.1%
24.95 | 6 | < 0.1%
25.4 | 3 | < 0.1%
25.85 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
292.57 | 1 | < 0.1%
290.3 | 2 | < 0.1%
285 | 1 | < 0.1%
284.86 | 1 | < 0.1%
281.68 | 1 | < 0.1%
281 | 1 | < 0.1%
280.32 | 1 | < 0.1%
280 | 1 | < 0.1%
278.96 | 1 | < 0.1%
276.24 | 1 | < 0.1%

BMI
Real number (ℝ)

MISSING 

Distinct: 3985
Distinct (%): 1.0%
Missing: 48806
Missing (%): 11.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.529842
Minimum: 12.02
Maximum: 99.64
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.4 MiB

Quantile statistics

Minimum: 12.02
5-th percentile: 20.12
Q1: 24.13
Median: 27.44
Q3: 31.75
95-th percentile: 40.69
Maximum: 99.64
Range: 87.62
Interquartile range (IQR): 7.62

Descriptive statistics

Standard deviation: 6.5548887
Coefficient of variation (CV): 0.22975552
Kurtosis: 4.4283868
Mean: 28.529842
Median Absolute Deviation (MAD): 3.73
Skewness: 1.3877393
Sum: 11307118
Variance: 42.966565
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
26.63 | 4262 | 1.0%
27.46 | 3277 | 0.7%
24.41 | 3188 | 0.7%
27.44 | 3128 | 0.7%
27.12 | 3123 | 0.7%
25.1 | 2726 | 0.6%
32.28 | 2417 | 0.5%
29.53 | 2334 | 0.5%
25.84 | 2331 | 0.5%
29.29 | 2308 | 0.5%
Other values (3975) | 367232 | 82.5%
(Missing) | 48806 | 11.0%

Minimum 10 values
Value | Count | Frequency (%)
12.02 | 1 | < 0.1%
12.05 | 1 | < 0.1%
12.06 | 1 | < 0.1%
12.11 | 3 | < 0.1%
12.15 | 1 | < 0.1%
12.16 | 5 | < 0.1%
12.19 | 1 | < 0.1%
12.2 | 1 | < 0.1%
12.21 | 3 | < 0.1%
12.24 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
99.64 | 1 | < 0.1%
99.34 | 1 | < 0.1%
97.65 | 5 | < 0.1%
97.43 | 1 | < 0.1%
96.2 | 1 | < 0.1%
95.66 | 2 | < 0.1%
94.66 | 1 | < 0.1%
93.88 | 2 | < 0.1%
93.51 | 1 | < 0.1%
93.41 | 1 | < 0.1%

AlcoholDrinkers
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 46574
Missing (%): 10.5%
Memory size: 869.5 KiB

Value | Count | Frequency (%)
True | 210891 | 47.4%
False | 187667 | 42.2%
(Missing) | 46574 | 10.5%

HIVTesting
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 66127
Missing (%): 14.9%
Memory size: 869.5 KiB

Value | Count | Frequency (%)
False | 249919 | 56.1%
True | 129086 | 29.0%
(Missing) | 66127 | 14.9%

FluVaxLast12
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 47121
Missing (%): 10.6%
Memory size: 869.5 KiB

Value | Count | Frequency (%)
True | 209256 | 47.0%
False | 188755 | 42.4%
(Missing) | 47121 | 10.6%

PneumoVaxEver
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 77040
Missing (%): 17.3%
Memory size: 869.5 KiB

Value | Count | Frequency (%)
False | 215604 | 48.4%
True | 152488 | 34.3%
(Missing) | 77040 | 17.3%

TetanusLast10Tdap
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 82516
Missing (%): 18.5%
Memory size: 3.4 MiB

Value | Count
No, did not receive any tetanus shot in the past 10 years | 121493
Yes, received tetanus shot but not sure what type | 113725
Yes, received Tdap | 99943
Yes, received tetanus shot, but not Tdap | 27455

Length

Max length: 57
Median length: 49
Mean length: 42.454828
Min length: 18

Characters and Unicode

Total characters: 15394800
Distinct characters: 24
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Yes, received tetanus shot but not sure what type
2nd row: No, did not receive any tetanus shot in the past 10 years
3rd row: No, did not receive any tetanus shot in the past 10 years
4th row: No, did not receive any tetanus shot in the past 10 years
5th row: No, did not receive any tetanus shot in the past 10 years

Common Values

Value | Count | Frequency (%)
No, did not receive any tetanus shot in the past 10 years | 121493 | 27.3%
Yes, received tetanus shot but not sure what type | 113725 | 25.5%
Yes, received Tdap | 99943 | 22.5%
Yes, received tetanus shot, but not Tdap | 27455 | 6.2%
(Missing) | 82516 | 18.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
not | 262673 | 8.8%
tetanus | 262673 | 8.8%
shot | 262673 | 8.8%
received | 241123 | 8.1%
yes | 241123 | 8.1%
but | 141180 | 4.7%
tdap | 127398 | 4.3%
10 | 121493 | 4.1%
years | 121493 | 4.1%
no | 121493 | 4.1%
Other values (9) | 1070133 | 36.0%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2610839 | 17.0%
e | 2062080 | 13.4%
t | 1662308 | 10.8%
s | 1123180 | 7.3%
a | 868275 | 5.6%
n | 768332 | 5.0%
o | 646839 | 4.2%
d | 611507 | 4.0%
i | 605602 | 3.9%
r | 597834 | 3.9%
Other values (14) | 3838004 | 24.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 11660890 | 75.7%
Space Separator | 2610839 | 17.0%
Uppercase Letter | 490014 | 3.2%
Other Punctuation | 390071 | 2.5%
Decimal Number | 242986 | 1.6%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 2062080 | 17.7%
t | 1662308 | 14.3%
s | 1123180 | 9.6%
a | 868275 | 7.4%
n | 768332 | 6.6%
o | 646839 | 5.5%
d | 611507 | 5.2%
i | 605602 | 5.2%
r | 597834 | 5.1%
u | 517578 | 4.4%
Other values (7) | 2197355 | 18.8%

Uppercase Letter
Value | Count | Frequency (%)
Y | 241123 | 49.2%
T | 127398 | 26.0%
N | 121493 | 24.8%

Decimal Number
Value | Count | Frequency (%)
0 | 121493 | 50.0%
1 | 121493 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2610839 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 390071 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12150904 | 78.9%
Common | 3243896 | 21.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 2062080 | 17.0%
t | 1662308 | 13.7%
s | 1123180 | 9.2%
a | 868275 | 7.1%
n | 768332 | 6.3%
o | 646839 | 5.3%
d | 611507 | 5.0%
i | 605602 | 5.0%
r | 597834 | 4.9%
u | 517578 | 4.3%
Other values (10) | 2687369 | 22.1%

Common
Value | Count | Frequency (%)
(space) | 2610839 | 80.5%
, | 390071 | 12.0%
0 | 121493 | 3.7%
1 | 121493 | 3.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 15394800 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2610839 | 17.0%
e | 2062080 | 13.4%
t | 1662308 | 10.8%
s | 1123180 | 7.3%
a | 868275 | 5.6%
n | 768332 | 5.0%
o | 646839 | 4.2%
d | 611507 | 4.0%
i | 605602 | 3.9%
r | 597834 | 3.9%
Other values (14) | 3838004 | 24.9%

HighRiskLastYear
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 50623
Missing (%): 11.4%
Memory size: 869.5 KiB

Value | Count | Frequency (%)
False | 377324 | 84.8%
True | 17185 | 3.9%
(Missing) | 50623 | 11.4%
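
The alert list at the top of this report flags HighRiskLastYear as highly imbalanced (74.2%). One formula that reproduces that figure from the counts above is one minus the normalized Shannon entropy of the non-missing classes. Treat this as an assumption about how the score is derived rather than as the report generator's documented definition; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(class_counts: list[int]) -> float:
    """Return 1 - H / log2(k), with H the Shannon entropy of the class shares.

    0 means perfectly balanced classes, 1 means a single class holds everything.
    Requires at least two classes (log2(1) would be zero).
    """
    total = sum(class_counts)
    shares = [c / total for c in class_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1 - entropy / math.log2(len(class_counts))


# Non-missing counts for HighRiskLastYear: False, True.
score = imbalance_score([377324, 17185])
print(f"{score:.1%}")
```

Missing values are left out of the calculation here; including them as a third class gives a noticeably lower score than the one in the alert.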

CovidPos
Categorical

MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 50764
Missing (%): 11.4%
Memory size: 3.4 MiB

Value | Count
No | 270055
Yes | 110877
Tested positive using home test without a health professional | 13436

Length

Max length: 61
Median length: 2
Mean length: 4.2912635
Min length: 2

Characters and Unicode

Total characters: 1692337
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: No
2nd row: No
3rd row: Yes
4th row: No
5th row: No

Common Values

Value | Count | Frequency (%)
No | 270055 | 60.7%
Yes | 110877 | 24.9%
Tested positive using home test without a health professional | 13436 | 3.0%
(Missing) | 50764 | 11.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
no | 270055 | 53.8%
yes | 110877 | 22.1%
tested | 13436 | 2.7%
positive | 13436 | 2.7%
using | 13436 | 2.7%
home | 13436 | 2.7%
test | 13436 | 2.7%
without | 13436 | 2.7%
a | 13436 | 2.7%
health | 13436 | 2.7%

Most occurring characters

Value | Count | Frequency (%)
o | 337235 | 19.9%
N | 270055 | 16.0%
e | 204929 | 12.1%
s | 191493 | 11.3%
Y | 110877 | 6.6%
(space) | 107488 | 6.4%
t | 94052 | 5.6%
i | 67180 | 4.0%
h | 53744 | 3.2%
a | 40308 | 2.4%
Other values (12) | 214976 | 12.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1190481 | 70.3%
Uppercase Letter | 394368 | 23.3%
Space Separator | 107488 | 6.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
o | 337235 | 28.3%
e | 204929 | 17.2%
s | 191493 | 16.1%
t | 94052 | 7.9%
i | 67180 | 5.6%
h | 53744 | 4.5%
a | 40308 | 3.4%
l | 26872 | 2.3%
p | 26872 | 2.3%
u | 26872 | 2.3%
Other values (8) | 120924 | 10.2%

Uppercase Letter
Value | Count | Frequency (%)
N | 270055 | 68.5%
Y | 110877 | 28.1%
T | 13436 | 3.4%

Space Separator
Value | Count | Frequency (%)
(space) | 107488 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1584849 | 93.6%
Common | 107488 | 6.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
o | 337235 | 21.3%
N | 270055 | 17.0%
e | 204929 | 12.9%
s | 191493 | 12.1%
Y | 110877 | 7.0%
t | 94052 | 5.9%
i | 67180 | 4.2%
h | 53744 | 3.4%
a | 40308 | 2.5%
l | 26872 | 1.7%
Other values (11) | 188104 | 11.9%

Common
Value | Count | Frequency (%)
(space) | 107488 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1692337 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
o | 337235 | 19.9%
N | 270055 | 16.0%
e | 204929 | 12.1%
s | 191493 | 11.3%
Y | 110877 | 6.6%
(space) | 107488 | 6.4%
t | 94052 | 5.6%
i | 67180 | 4.0%
h | 53744 | 3.2%
a | 40308 | 2.4%
Other values (12) | 214976 | 12.7%

Interactions

[Figure: 36 interaction plots]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
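
Nullity correlation can be sketched as the Pearson correlation between the missing-value indicators of two columns: +1 when they are always missing together, -1 when one is missing exactly where the other is present. The version below is a minimal plain-Python illustration on made-up toy columns, with `None` standing in for a missing value; the function name is hypothetical, and the report's own heatmap may be computed differently in detail. With pandas, the same quantity is commonly obtained from `df.isna().corr()`.

```python
import math


def nullity_correlation(col_a: list, col_b: list) -> float:
    """Pearson correlation between the missing-value indicators of two columns.

    Undefined (division by zero) if a column has no missing values
    or consists only of missing values.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy columns (hypothetical values, not taken from the dataset).
height = [1.60, None, 1.57, None, 1.80]
weight = [68.0, None, 63.5, None, 84.8]     # missing where height is missing
checkup = [None, "yes", None, "yes", None]  # missing where height is present

same = nullity_correlation(height, weight)
opposite = nullity_correlation(height, checkup)
print(same, opposite)
```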

Sample

index | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | HadAngina | HadStroke | HadAsthma | HadSkinCancer | HadCOPD | HadDepressiveDisorder | HadKidneyDisease | HadArthritis | HadDiabetes | DeafOrHardOfHearing | BlindOrVisionDifficulty | DifficultyConcentrating | DifficultyWalking | DifficultyDressingBathing | DifficultyErrands | SmokerStatus | ECigaretteUsage | ChestScan | RaceEthnicityCategory | AgeCategory | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos
0 | Alabama | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | No | 8.0 | NaN | No | No | No | No | No | No | No | No | No | Yes | No | No | No | No | No | No | Never smoked | Not at all (right now) | No | White only, Non-Hispanic | Age 80 or older | NaN | NaN | NaN | No | No | Yes | No | Yes, received tetanus shot but not sure what type | No | No
1 | Alabama | Female | Excellent | 0.0 | 0.0 | NaN | No | 6.0 | NaN | No | No | No | No | Yes | No | No | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | White only, Non-Hispanic | Age 80 or older | 1.60 | 68.04 | 26.57 | No | No | No | No | No, did not receive any tetanus shot in the past 10 years | No | No
2 | Alabama | Female | Very good | 2.0 | 3.0 | Within past year (anytime less than 12 months ago) | Yes | 5.0 | NaN | No | No | No | No | Yes | No | No | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | White only, Non-Hispanic | Age 55 to 59 | 1.57 | 63.50 | 25.61 | No | No | No | No | NaN | No | Yes
3 | Alabama | Female | Excellent | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | NaN | No | No | No | Yes | No | No | No | No | Yes | No | No | No | No | No | No | No | Current smoker - now smokes some days | Never used e-cigarettes in my entire life | Yes | White only, Non-Hispanic | NaN | 1.65 | 63.50 | 23.30 | No | No | Yes | Yes | No, did not receive any tetanus shot in the past 10 years | No | No
4 | Alabama | Female | Fair | 2.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 9.0 | NaN | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | Yes | White only, Non-Hispanic | Age 40 to 44 | 1.57 | 53.98 | 21.77 | Yes | No | No | Yes | No, did not receive any tetanus shot in the past 10 years | No | No
5 | Alabama | Male | Poor | 1.0 | 0.0 | Within past year (anytime less than 12 months ago) | No | 7.0 | NaN | Yes | No | Yes | No | No | No | No | No | No | Yes | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | White only, Non-Hispanic | Age 80 or older | 1.80 | 84.82 | 26.08 | No | No | No | Yes | No, did not receive any tetanus shot in the past 10 years | No | No
6 | Alabama | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | NaN | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Former smoker | Never used e-cigarettes in my entire life | No | Black only, Non-Hispanic | Age 80 or older | 1.65 | 62.60 | 22.96 | Yes | No | No | No | No, did not receive any tetanus shot in the past 10 years | No | No
7 | Alabama | Female | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | No | 8.0 | NaN | No | No | No | No | No | No | No | No | Yes | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | Yes | White only, Non-Hispanic | Age 80 or older | 1.63 | 73.48 | 27.81 | No | No | Yes | Yes | Yes, received tetanus shot but not sure what type | No | No
8 | Alabama | Female | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 6.0 | NaN | No | No | No | No | Yes | No | No | No | Yes | No | No | Yes | No | Yes | No | No | Former smoker | Not at all (right now) | NaN | White only, Non-Hispanic | Age 75 to 79 | 1.70 | NaN | NaN | No | Yes | No | No | Yes, received tetanus shot but not sure what type | No | No
9 | Alabama | Female | Good | 1.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | NaN | No | No | No | No | No | No | No | Yes | No | Yes | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | NaN | White only, Non-Hispanic | Age 70 to 74 | 1.68 | 81.65 | 29.05 | Yes | NaN | Yes | Yes | No, did not receive any tetanus shot in the past 10 years | No | No

index | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | HadAngina | HadStroke | HadAsthma | HadSkinCancer | HadCOPD | HadDepressiveDisorder | HadKidneyDisease | HadArthritis | HadDiabetes | DeafOrHardOfHearing | BlindOrVisionDifficulty | DifficultyConcentrating | DifficultyWalking | DifficultyDressingBathing | DifficultyErrands | SmokerStatus | ECigaretteUsage | ChestScan | RaceEthnicityCategory | AgeCategory | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos
445122 | Virgin Islands | Male | Fair | 30.0 | 1.0 | Within past year (anytime less than 12 months ago) | No | 6.0 | 6 or more, but not all | No | NaN | Yes | No | No | Yes | No | No | No | No, pre-diabetes or borderline diabetes | No | No | No | No | No | No | Former smoker | Never used e-cigarettes in my entire life | Yes | White only, Non-Hispanic | Age 70 to 74 | 1.78 | 70.31 | 22.24 | No | No | Yes | NaN | Yes, received tetanus shot but not sure what type | No | Yes
445123 | Virgin Islands | Female | Fair | 0.0 | 7.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | None of them | No | No | No | No | No | No | Yes | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | Black only, Non-Hispanic | Age 25 to 29 | 1.93 | 90.72 | 24.34 | No | No | No | No | No, did not receive any tetanus shot in the past 10 years | No | Yes
445124 | Virgin Islands | Male | Good | 0.0 | 15.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | 1 to 5 | No | No | Yes | No | No | No | No | No | Yes | Yes | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | Multiracial, Non-Hispanic | Age 65 to 69 | 1.68 | 83.91 | 29.86 | Yes | Yes | Yes | Yes | Yes, received tetanus shot but not sure what type | No | Yes
445125 | Virgin Islands | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 5.0 | NaN | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Former smoker | Never used e-cigarettes in my entire life | NaN | Black only, Non-Hispanic | Age 65 to 69 | 1.68 | 74.84 | 26.63 | NaN | NaN | NaN | NaN | NaN | NaN | NaN
445126 | Virgin Islands | Male | Good | 0.0 | 0.0 | Within past 2 years (1 year but less than 2 years ago) | Yes | 8.0 | None of them | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | White only, Non-Hispanic | Age 30 to 34 | 1.83 | 104.33 | 31.19 | Yes | NaN | No | No | NaN | No | Yes
445127 | Virgin Islands | Female | Good | 0.0 | 3.0 | Within past 2 years (1 year but less than 2 years ago) | Yes | 6.0 | None of them | No | No | No | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | Yes | Black only, Non-Hispanic | Age 18 to 24 | 1.65 | 69.85 | 25.63 | NaN | Yes | No | No | No, did not receive any tetanus shot in the past 10 years | No | Yes
445128 | Virgin Islands | Female | Excellent | 2.0 | 2.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | None of them | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | No | Black only, Non-Hispanic | Age 50 to 54 | 1.70 | 83.01 | 28.66 | No | Yes | Yes | No | Yes, received tetanus shot but not sure what type | No | No
445129 | Virgin Islands | Female | Poor | 30.0 | 30.0 | 5 or more years ago | No | 5.0 | 1 to 5 | No | No | No | No | No | No | No | No | No | No | No | No | NaN | No | No | No | Current smoker - now smokes every day | Use them some days | NaN | NaN | Age 65 to 69 | 1.70 | 49.90 | 17.23 | NaN | No | No | No | No, did not receive any tetanus shot in the past 10 years | No | No
445130 | Virgin Islands | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | No | 5.0 | None of them | Yes | No | No | Yes | No | No | No | No | No | No | No | No | No | No | No | No | Never smoked | Never used e-cigarettes in my entire life | Yes | Black only, Non-Hispanic | Age 70 to 74 | 1.83 | 108.86 | 32.55 | No | Yes | Yes | Yes | No, did not receive any tetanus shot in the past 10 years | No | Yes
445131 | Virgin Islands | Male | Very good | 0.0 | 1.0 | NaN | Yes | 5.0 | None of them | No | No | No | No | No | No | No | No | No | No | No | No | Yes | Yes | No | No | Former smoker | Not at all (right now) | Yes | Black only, Non-Hispanic | Age 40 to 44 | 1.68 | 63.50 | 22.60 | Yes | No | No | No | Yes, received tetanus shot but not sure what type | No | No

Duplicate rows

Most frequently occurring

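|  | State | Sex | GeneralHealth | PhysicalHealthDays | MentalHealthDays | LastCheckupTime | PhysicalActivities | SleepHours | RemovedTeeth | HadHeartAttack | HadAngina | HadStroke | HadAsthma | HadSkinCancer | HadCOPD | HadDepressiveDisorder | HadKidneyDisease | HadArthritis | HadDiabetes | DeafOrHardOfHearing | BlindOrVisionDifficulty | DifficultyConcentrating | DifficultyWalking | DifficultyDressingBathing | DifficultyErrands | SmokerStatus | ECigaretteUsage | ChestScan | RaceEthnicityCategory | AgeCategory | HeightInMeters | WeightInKilograms | BMI | AlcoholDrinkers | HIVTesting | FluVaxLast12 | PneumoVaxEver | TetanusLast10Tdap | HighRiskLastYear | CovidPos | # duplicates |
| 15 | Connecticut | Male | Excellent | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | White only, Non-Hispanic | Age 55 to 59 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 |
| 45 | Louisiana | Female | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 |
| 7 | Colorado | Male | Good | 0.0 | 0.0 | NaN | Yes | 8.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Hispanic | Age 18 to 24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 24 | Florida | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 8.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | White only, Non-Hispanic | Age 65 to 69 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 62 | Minnesota | Female | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | White only, Non-Hispanic | Age 60 to 64 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 89 | New York | Male | Excellent | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | White only, Non-Hispanic | Age 50 to 54 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 92 | New York | Male | Good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 8.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Hispanic | Age 40 to 44 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 100 | Ohio | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 7.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | White only, Non-Hispanic | Age 45 to 49 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 104 | South Carolina | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 8.0 | 1 to 5 | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | White only, Non-Hispanic | Age 75 to 79 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |
| 114 | Texas | Male | Very good | 0.0 | 0.0 | Within past year (anytime less than 12 months ago) | Yes | 8.0 | None of them | No | No | No | No | No | No | No | No | No | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Hispanic | Age 18 to 24 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3 |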
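
A "most frequently occurring" duplicate listing like the one above can be approximated with a short sketch. This is not the report generator's own code. It assumes that "# duplicates" counts how many times each identical row occurs, and that missing values are represented as None rather than NaN (NaN does not compare equal to itself, so it would not group reliably). The function name and the toy records are hypothetical.

```python
from collections import Counter

def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows that occur more than once,
    ordered from most to least frequent (hypothetical helper)."""
    # Tuples are hashable, so each full row can serve as a Counter key.
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    # Stable sort: ties keep their first-seen order.
    duplicated.sort(key=lambda pair: pair[1], reverse=True)
    return duplicated[:top]

# Toy records (State, Sex, GeneralHealth); None stands in for a missing value.
sample = [
    ("Connecticut", "Male", "Excellent"),
    ("Louisiana", "Female", None),
    ("Connecticut", "Male", "Excellent"),
    ("Ohio", "Male", "Very good"),
    ("Louisiana", "Female", None),
    ("Connecticut", "Male", "Excellent"),
]
result = most_frequent_duplicates(sample)
for row, n in result:
    print(row, n)
```

Rows that occur only once (Ohio in the toy data) are dropped, which mirrors how the table lists only repeated records.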